This paper considers the minimization of a general objective function $f(X)$ over the set of non-square $n\times m$ matrices where the optimal solution $X^\star$ is low-rank. To reduce the computational burden, we factorize the variable $X$ into a product of two smaller matrices and optimize over these two matrices instead of $X$. Despite the resulting nonconvexity, recent studies in matrix completion and sensing have shown that the factored problem has no spurious local minima and obeys the so-called strict saddle property (the function has a directional negative curvature at all critical points except local minima). We analyze the global geometry for a general and yet well-conditioned objective function $f(X)$ whose restricted strong convexity and restricted strong smoothness constants are comparable. In particular, we show that the reformulated objective function has no spurious local minima and obeys the strict saddle property. These geometric properties imply that a number of iterative optimization algorithms (such as gradient descent) can provably solve the factored problem with global convergence.
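The factorization approach described above can be illustrated with a minimal sketch (not the paper's experiments): take the simplest well-conditioned objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$, whose strong convexity and smoothness constants are both 1, with a low-rank minimizer $M$, factor $X = UV^\top$, and run plain gradient descent on the two factors from a small random initialization. All dimensions, step sizes, and iteration counts here are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 30, 20, 3

# Ground-truth low-rank matrix M (rank r), rescaled to unit spectral norm
# so a fixed step size is safe. (Illustrative setup, not from the paper.)
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
M /= np.linalg.norm(M, 2)

# Small random initialization of the two factors in X = U @ V.T.
U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((m, r))

# Gradient descent on g(U, V) = 0.5 * ||U V^T - M||_F^2:
# grad_U = (U V^T - M) V,  grad_V = (U V^T - M)^T U.
step = 0.1
for _ in range(3000):
    G = U @ V.T - M
    U, V = U - step * (G @ V), V - step * (G.T @ U)

rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(rel_err)
```

Despite the nonconvexity of the factored problem (the origin, for instance, is a saddle point), gradient descent from this random initialization drives the relative error to near zero, consistent with the no-spurious-local-minima and strict saddle properties the abstract describes.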